Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
- Free, publicly-accessible full text available August 1, 2026
- Cluster randomized trials (CRTs) are commonly used to evaluate the causal effects of educational interventions, where entire clusters (e.g., schools) are randomly assigned to treatment or control conditions. This study introduces statistical methods for designing and analyzing two-level (e.g., students nested within schools) and three-level (e.g., students nested within classrooms nested within schools) CRTs. Specifically, we use hierarchical linear models (HLMs) to account for the dependency among intervention participants within the same clusters and to estimate the average treatment effects (ATEs) of educational interventions and other effects of interest (e.g., moderator and mediator effects). We demonstrate methods and tools for sample size planning and statistical power analysis. Additionally, we discuss common challenges and potential solutions in the design and analysis phases, including the effects of omitting one level of clustering, non-compliance, threats to external validity, and the cost-effectiveness of the intervention. We conclude with practical suggestions for CRT design and analysis, along with recommendations for further reading.
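As a rough companion to the power-analysis discussion in this abstract, the sketch below computes approximate power for a balanced two-level CRT using the standard noncentral-t formulation with no covariates. The design values (40 schools, 50 students per school, ICC of 0.20, effect size of 0.25) are hypothetical and not taken from the study.

```python
# A rough power calculation for a balanced two-level CRT (no covariates),
# using the standard noncentral-t approach. All design values are illustrative.
from scipy.stats import nct, t


def crt_power_two_level(delta, J, n, rho, alpha=0.05):
    """Approximate two-sided power for a balanced two-level CRT.

    delta: standardized effect size; J: total clusters (half per arm);
    n: individuals per cluster; rho: intraclass correlation (ICC).
    """
    se = (4.0 / J * (rho + (1.0 - rho) / n)) ** 0.5  # SE of the standardized effect
    lam = delta / se                                  # noncentrality parameter
    df = J - 2                                        # cluster-level degrees of freedom
    t_crit = t.ppf(1 - alpha / 2, df)
    return 1 - nct.cdf(t_crit, df, lam) + nct.cdf(-t_crit, df, lam)


# Hypothetical design: effect size 0.25, 40 schools, 50 students each, ICC 0.20
print(round(crt_power_two_level(0.25, 40, 50, 0.20), 3))
```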
- Cost-effectiveness analysis studies in education often prioritize descriptive statistics of cost-effectiveness measures, such as the point estimate of the incremental cost-effectiveness ratio (ICER), while neglecting inferential statistics such as confidence intervals (CIs). Without CIs, it is impossible to make meaningful comparisons of alternative educational strategies, as there is no basis for assessing the uncertainty of point estimates or the plausible range of ICERs. This study evaluates the relative performance of five methods of constructing CIs for ICERs in randomized controlled trials with cost-effectiveness analyses. We found that the Monte Carlo interval method based on summary statistics consistently performed well in terms of coverage, width, and symmetry, yielding estimates comparable to the percentile bootstrap method across multiple scenarios. In contrast, Fieller’s method did not work well with small sample sizes and small treatment effects, and Taylor’s method and the Box method performed least well. We discussed two-sided and one-sided hypothesis testing based on ICER CIs, developed tools for calculating these ICER CIs, and demonstrated the calculation using an empirical example. We concluded with suggestions for applications and extensions of this work.
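The Monte Carlo interval idea mentioned in this abstract can be illustrated with a short sketch: draw incremental effectiveness and cost from a bivariate normal parameterized by summary statistics, form the ICER for each draw, and take percentiles. The summary statistics below are invented for illustration, and the naive percentile interval shown here assumes incremental effectiveness is clearly bounded away from zero.

```python
# Monte Carlo interval for an ICER from summary statistics: sample incremental
# effectiveness and cost from a bivariate normal, form the ICER per draw, and
# take percentiles. All summary statistics below are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)

dE, dC = 0.30, 1200.0              # incremental effectiveness and cost (hypothetical)
se_E, se_C, r = 0.10, 400.0, 0.20  # standard errors and their correlation

cov = np.array([[se_E**2, r * se_E * se_C],
                [r * se_E * se_C, se_C**2]])
draws = rng.multivariate_normal([dE, dC], cov, size=100_000)

icer = draws[:, 1] / draws[:, 0]           # cost per unit of effectiveness
lo, hi = np.percentile(icer, [2.5, 97.5])  # 95% Monte Carlo interval
print(f"ICER point estimate: {dC / dE:.0f}")
print(f"95% CI: ({lo:.0f}, {hi:.0f})")
```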
- This study introduces recent advances in statistical power analysis methods and tools for designing and analyzing randomized cost-effectiveness trials (RCETs) to evaluate the causal effects and costs of social work interventions. The article focuses on two-level designs, where, for example, students are nested within schools, with interventions applied either at the school level (cluster design) or the student level (multisite design). We explore three statistical modeling strategies—random-effects, constant-effects, and fixed-effects models—to assess the cost-effectiveness of interventions, and we develop corresponding power analysis methods and tools. Power is influenced by effect size, sample sizes, and design parameters. We developed a user-friendly tool, PowerUp!-CEA, to aid researchers in planning RCETs. When designing RCETs, it is crucial to consider cost variance, its nested effects, and the covariance between effectiveness and cost data, as neglecting these factors may lead to underestimated power.
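One way to see why cost variance and the effectiveness-cost covariance matter for power is through the variance of an incremental net monetary benefit. The sketch below is a generic z-test approximation under that framing, not the random-, constant-, or fixed-effects machinery or the PowerUp!-CEA tool described in the abstract; all inputs are hypothetical.

```python
# Approximate power for an incremental net monetary benefit (INMB) z-test.
# Var(INMB) = lam^2 * Var(dE) + Var(dC) - 2 * lam * Cov(dE, dC), which is one
# way to see why cost variance and the effect-cost covariance affect power.
# All inputs are hypothetical; this is not the PowerUp!-CEA formulation.
import math
from scipy.stats import norm


def inmb_power(lam, dE, dC, var_E, var_C, cov_EC, alpha=0.05):
    inmb = lam * dE - dC                     # willingness-to-pay-scaled net benefit
    se = math.sqrt(lam**2 * var_E + var_C - 2 * lam * cov_EC)
    z = abs(inmb) / se
    return 1 - norm.cdf(norm.ppf(1 - alpha / 2) - z)


# With vs. without the effectiveness-cost covariance term
print(round(inmb_power(5000, 0.30, 1200, 0.10**2, 400.0**2, 0.2 * 0.10 * 400.0), 3))
print(round(inmb_power(5000, 0.30, 1200, 0.10**2, 400.0**2, 0.0), 3))
```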
- Extant literature on moderation effects narrowly focuses on the average moderated treatment effect across the entire sample (AMTE). Missing are the average moderated treatment effect on the treated (AMTT) and on other targeted subgroups (AMTS). Much like the average treatment effect on the treated (ATT) for main effects, the AMTS changes the target of inference from the entire sample to targeted subgroups. Relative to the AMTE, the AMTS is identified under weaker assumptions and often captures more policy-relevant effects. We present a theoretical framework that introduces the AMTS under the potential outcomes framework and delineates the assumptions for causal identification. We then propose a generalized propensity score method as a tool to estimate the AMTS, using weights derived with Bayes’ theorem. We illustrate the results and differences among the estimands using data from the Early Childhood Longitudinal Study. We conclude with suggestions for future research.
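A generic flavor of the weighting idea can be sketched as follows: fit a propensity model, give control units the odds weight e(x)/(1 - e(x)) so they resemble the treated group, and read the moderated effect off a weighted interaction regression. This is a simplified ATT-style illustration on simulated data, not the generalized propensity score estimator developed in the paper; all variable names and values are hypothetical.

```python
# Simplified ATT-style weighting illustration on simulated data: control units
# receive the odds weight e(x) / (1 - e(x)) so they resemble the treated group,
# and the moderated effect is read off a weighted interaction regression.
# This is a generic sketch, not the estimator developed in the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

x = rng.normal(size=n)                               # covariate
m = rng.binomial(1, 0.5, size=n)                     # binary moderator
z = rng.binomial(1, 1 / (1 + np.exp(-0.5 * x)))      # treatment, confounded by x
y = 1 + 0.5 * x + 0.3 * m + 0.4 * z + 0.3 * z * m + rng.normal(size=n)

# Propensity scores e(x) = P(Z = 1 | x)
e = LogisticRegression().fit(x.reshape(-1, 1), z).predict_proba(x.reshape(-1, 1))[:, 1]
w = np.where(z == 1, 1.0, e / (1 - e))               # treated: 1, control: odds weight

# Weighted least squares with a treatment-by-moderator interaction
X = np.column_stack([np.ones(n), x, m, z, z * m])
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
print(f"moderated effect among the treated (z*m term): {beta[4]:.3f}")  # truth: 0.3
```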
- Past research has demonstrated that treatment effects frequently vary across sites (e.g., schools) and that such variation can be explained by site-level or individual-level variables (e.g., school size or gender). The purpose of this study is to develop a statistical framework and tools for the effective and efficient design of multisite randomized trials (MRTs) probing moderated treatment effects. The framework considers three core facets of such designs: (a) Level 1 and Level 2 moderators, (b) random and nonrandomly varying slopes (coefficients) of the treatment variable and its interaction terms with the moderators, and (c) binary and continuous moderators. We validate the formulas for calculating statistical power and the minimum detectable effect size difference with simulations, probe their sensitivity to model assumptions, implement the formulas in accessible software, demonstrate an application, and provide suggestions for designing MRTs probing moderated treatment effects.
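Closed-form power formulas of the kind validated in this study can also be cross-checked by simulation. The sketch below runs a crude Monte Carlo for a site-level binary moderator in a multisite trial, using a simple two-step estimator (per-site effect estimates regressed on the moderator) rather than the HLM-based formulas in the article; every design value is illustrative.

```python
# Crude Monte Carlo power check for a site-level (Level 2) binary moderator in a
# multisite trial, using a simple two-step estimator (per-site effect estimates
# regressed on the moderator) instead of HLM-based formulas. Values are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)


def mrt_moderation_power(J=60, n=40, tau=0.15, gamma=0.20, sigma=1.0,
                         reps=1000, alpha=0.05):
    rejections = 0
    for _ in range(reps):
        w = rng.binomial(1, 0.5, size=J)              # site-level binary moderator
        b = gamma * w + rng.normal(0, tau, size=J)    # true site treatment effects
        yt = b[:, None] + rng.normal(0, sigma, size=(J, n // 2))  # treated outcomes
        yc = rng.normal(0, sigma, size=(J, n // 2))               # control outcomes
        est = yt.mean(axis=1) - yc.mean(axis=1)       # estimated site effects
        rejections += stats.linregress(w, est).pvalue < alpha
    return rejections / reps


print(mrt_moderation_power())  # share of replications detecting the moderation effect
```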
- Background: Evaluation studies frequently draw on fallible outcomes that contain significant measurement error. Ignoring outcome measurement error in the planning stages can undermine the sufficiency and efficiency of an otherwise well-designed study and can further constrain the evidence studies bring to bear on the effectiveness of programs. Objectives: We develop simple formulas to adjust statistical power, minimum detectable effect (MDE), and optimal sample allocation formulas for two-level cluster- and multisite-randomized designs when the outcome is subject to measurement error. Results: The resulting adjusted formulas suggest that outcome measurement error typically amplifies treatment effect uncertainty, reduces power, increases the MDE, and undermines the efficiency of conventional optimal sampling schemes. Therefore, achieving adequate power for a given effect size will typically demand increased sample sizes when outcomes are fallible, while maintaining design efficiency will require that increasing portions of a budget be applied toward sampling a larger number of individuals within clusters. We illustrate evaluation planning with the new formulas, comparing them to conventional formulas using hypothetical examples based on recent empirical studies. To encourage adoption of the new formulas, we implement them in the R package PowerUpR and in the PowerUp software.
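The general logic of the adjustment can be sketched under one simplifying assumption: if measurement error is pure individual-level noise and the outcome has reliability r, the observed standardized effect shrinks by sqrt(r) and the observed ICC shrinks by the factor r. The code below plugs that into a standard two-level CRT power formula; the exact adjusted formulas implemented in PowerUpR and PowerUp may differ in detail, and all numbers are illustrative.

```python
# Sketch of the attenuation logic: if outcome measurement error is pure
# individual-level noise with reliability r, the observed standardized effect
# shrinks by sqrt(r) and the observed ICC shrinks by the factor r. The exact
# adjusted formulas in PowerUpR / PowerUp may differ; numbers are illustrative.
from scipy.stats import nct, t


def crt_power(delta, J, n, rho, alpha=0.05):
    """Standard two-sided power for a balanced two-level CRT (no covariates)."""
    se = (4.0 / J * (rho + (1.0 - rho) / n)) ** 0.5
    lam, df = delta / se, J - 2
    t_crit = t.ppf(1 - alpha / 2, df)
    return 1 - nct.cdf(t_crit, df, lam) + nct.cdf(-t_crit, df, lam)


def crt_power_fallible(delta, J, n, rho, reliability, alpha=0.05):
    """Same design, but the outcome is measured with individual-level error."""
    return crt_power(delta * reliability**0.5, J, n, rho * reliability, alpha)


print(round(crt_power(0.25, 40, 50, 0.20), 3))                 # error-free outcome
print(round(crt_power_fallible(0.25, 40, 50, 0.20, 0.70), 3))  # reliability = 0.70
```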